Aligning Alignments
Authors
Abstract
While the area of sequence comparison has a rich collection of results on the alignment of two sequences, and even the alignment of multiple sequences, little is known about the alignment of two alignments. The problem becomes interesting when the alignment objective function counts gaps, as is common when aligning biological sequences, and has the form of the sum-of-pairs objective. We begin a thorough investigation of aligning two alignments under the sum-of-pairs objective with general linear gap costs when either of the two alignments is given in the form of a sequence (a degenerate alignment containing a single sequence), a multiple alignment (containing two or more sequences), or a profile (a representation of a multiple alignment often used in computational biology). This leads to five problem variations, some of which arise in widely used heuristics for multiple sequence alignment, and in assessing the relatedness of a sequence to a sequence family. For variations in which exact gap counts are computationally difficult to determine, we offer a framework in terms of optimistic and pessimistic gap counts. For optimistic and pessimistic gap counts we give efficient algorithms for the sequence vs. alignment, sequence vs. profile, alignment vs. alignment, and profile vs. profile variations, all of which run in essentially O(mn) time for two input alignments of lengths m and n. For exact gap counts, we give the first provably efficient algorithm for the sequence vs. alignment variation, which runs in essentially O(mn log n) time using the candidate-list technique developed for convex gap costs, and we conjecture that the alignment vs. alignment variation is NP-complete.
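To make the setting concrete, the following is a minimal sketch of aligning the columns of two alignments under a sum-of-pairs cost. It is not the algorithm from the paper: it deliberately charges a fixed cost per gap character and ignores gap-open counts, which is exactly the simplification that removes the difficulty the abstract describes. The function names (`column_cost`, `align_alignments`) and the cost values (`mismatch=1`, `gap=2`) are assumptions chosen for illustration.

```python
from itertools import product

GAP = "-"


def column_cost(col_a, col_b, mismatch=1, gap=2):
    """Sum-of-pairs cost between one column of A and one column of B.

    Only cross pairs (one character from each alignment) are counted.
    Assumption: a gap facing a gap costs nothing, and every other gap
    character costs a flat `gap` (no gap-open term).
    """
    cost = 0
    for x, y in product(col_a, col_b):
        if x == GAP and y == GAP:
            continue
        if x == GAP or y == GAP:
            cost += gap
        elif x != y:
            cost += mismatch
    return cost


def align_alignments(A, B, mismatch=1, gap=2):
    """Minimum cross sum-of-pairs cost of aligning the columns of A and B.

    A and B are lists of equal-length rows. The table has (m+1) x (n+1)
    cells for m and n columns; each cell evaluates a constant number of
    column-vs-column costs.
    """
    cols_a = list(zip(*A))
    cols_b = list(zip(*B))
    m, n = len(cols_a), len(cols_b)
    all_gap_a = (GAP,) * len(A)  # column inserted into A
    all_gap_b = (GAP,) * len(B)  # column inserted into B

    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + column_cost(cols_a[i - 1], all_gap_b, mismatch, gap)
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + column_cost(all_gap_a, cols_b[j - 1], mismatch, gap)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(
                # align column i of A with column j of B
                D[i - 1][j - 1] + column_cost(cols_a[i - 1], cols_b[j - 1], mismatch, gap),
                # align column i of A with an all-gap column in B
                D[i - 1][j] + column_cost(cols_a[i - 1], all_gap_b, mismatch, gap),
                # align an all-gap column in A with column j of B
                D[i][j - 1] + column_cost(all_gap_a, cols_b[j - 1], mismatch, gap),
            )
    return D[m][n]
```

For instance, `align_alignments(["ACG", "A-G"], ["AG"])` aligns a two-row alignment with a single sequence (the sequence vs. alignment variation). Once each gap is also charged an opening cost, the number of gaps started in a column depends on earlier columns, which is where the optimistic, pessimistic, and exact gap counts of the abstract come in.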
Similar resources
Optimal Alignment of Multiple Sequence Alignments
Chapter 1: Introduction. 1.1 Perspective. 1.2 Contributions. 1.3 Overview ...
Aligning Alignments Exactly (Extended abstract)
A basic computational problem that arises in both the construction and local-search phases of the best heuristics for multiple sequence alignment is that of aligning the columns of two multiple alignments. When the scoring function is the sum-of-pairs objective and induced pairwise alignments are evaluated using linear gap-costs, we call this problem Aligning Alignments. While seemingly a strai...
A System for Aligning Taxonomies and Debugging Taxonomies and Their Alignments
With the increased use of ontologies in semantically-enabled applications, the issues of debugging and aligning ontologies have become increasingly important. The quality of the results of such applications is directly dependent on the quality of the ontologies and mappings between the ontologies they employ. A key step towards achieving high quality ontologies and mappings is discovering and r...
Statistical potential-based amino acid similarity matrices for aligning distantly related protein sequences.
Aligning distantly related protein sequences is a long-standing problem in bioinformatics, and a key for successful protein structure prediction. Its importance is increasing recently in the context of structural genomics projects because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods...
A Standard for Aligning Mathematical Concepts
Mathematical knowledge is publicly available in dozens of different formats and languages, ranging from informal (e.g. Wikipedia) to formal corpora (e.g., Mizar). Despite an enormous amount of overlap between these corpora, few machine-actionable connections exist. We speak of alignment if the same concept occurs in different libraries, possibly with slightly different names, notations, or form...